training signal
- Media > Music (0.46)
- Leisure & Entertainment (0.46)
- Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.14)
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
- Europe > Germany (0.04)
- Asia > Singapore (0.04)
g-DPO: Scalable Preference Optimization for Protein Language Models
Constance Ferragu, Jonathan D. Ziegler, Nicolas Deutschmann, Arthur Lindoulsi, Eli Bixby, Cradle ML Team
Direct Preference Optimization (DPO) is an effective approach for aligning protein language models with experimental design goals. However, DPO faces a scalability bottleneck: the number of possible training pairs grows quadratically with the number of labeled sequences, leading to prohibitive training times even for modestly sized datasets. We introduce g-DPO, a framework that (i) uses sequence-space clustering to prune redundant pairs while preserving training signal, and (ii) amortizes likelihood computations with group-based approximations. Across three protein engineering tasks, g-DPO maintains in silico and in vitro performance that is statistically indistinguishable from standard DPO, while converging 1.7x to 5.4x faster, with speedups that scale with dataset size and the structure of the underlying mutational landscape.
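The pair-pruning idea in (i) can be sketched as follows. This is a minimal illustration, not the authors' code: the prefix-based grouping is a toy stand-in for the paper's sequence-space clustering, and the names `cluster_by_prefix` and `pruned_pairs` are hypothetical.

```python
from itertools import combinations

def cluster_by_prefix(seqs, k=3):
    """Toy sequence-space clustering: group sequences sharing a k-token
    prefix. Any real sequence clustering could be substituted here."""
    clusters = {}
    for s in seqs:
        clusters.setdefault(s[:k], []).append(s)
    return list(clusters.values())

def pruned_pairs(scored_seqs, k=3):
    """Build preference pairs only across cluster representatives,
    pruning the near-redundant within-cluster pairs that would otherwise
    make the pair count grow quadratically."""
    score = dict(scored_seqs)
    clusters = cluster_by_prefix([s for s, _ in scored_seqs], k)
    # One representative per cluster: its best-scoring member.
    reps = [max(c, key=lambda s: score[s]) for c in clusters]
    pairs = []
    for a, b in combinations(reps, 2):
        if score[a] != score[b]:
            winner, loser = (a, b) if score[a] > score[b] else (b, a)
            pairs.append((winner, loser))
    return pairs
```

With five sequences, the full pair set has 10 candidates; pruning to cluster representatives leaves only the cross-cluster pairs.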
- Research Report > New Finding (0.68)
- Research Report > Experimental Study (0.46)
Leveraging Reinforcement Learning, Genetic Algorithms and Transformers for background determination in particle physics
Guillermo Hijano Mendizabal, Davide Lancierini, Alex Marshall, Andrea Mauri, Patrick Haworth Owen, Mitesh Patel, Konstantinos Petridis, Shah Rukh Qasim, Nicola Serra, William Sutcliffe, Hanae Tilquin
Experimental studies of beauty hadron decays face significant challenges due to a wide range of backgrounds arising from the numerous possible decay channels with similar final states. For a particular signal decay, the process for ascertaining the most relevant background processes necessitates a detailed analysis of final state particles, potential misidentifications, and kinematic overlaps, which, due to computational limitations, is restricted to the simulation of only the most relevant backgrounds. Moreover, this process typically relies on the physicist's intuition and expertise, as no systematic method exists. This paper has two primary goals. First, from a particle physics perspective, we present a novel approach that utilises Reinforcement Learning (RL) to overcome the aforementioned challenges by systematically determining the critical backgrounds affecting beauty hadron decay measurements. While beauty hadron physics serves as the case study in this work, the proposed strategy is broadly adaptable to other types of particle physics measurements. Second, from a Machine Learning perspective, we introduce a novel algorithm which exploits the synergy between RL and Genetic Algorithms (GAs) for environments with highly sparse rewards and a large trajectory space. This strategy leverages GAs to efficiently explore the trajectory space and identify successful trajectories, which are used to guide the RL agent's training. Our method also incorporates a transformer architecture for the RL agent to handle token sequences representing decays.
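The GA side of the GA-RL synergy can be illustrated with a minimal sketch on a toy fitness landscape (not the beauty-decay environment; `genetic_explore`, its operators, and its parameters are assumptions for illustration). The GA explores the trajectory space and returns every successful trajectory it finds; in the paper's setting, such trajectories would then guide the RL agent's training.

```python
import random

def genetic_explore(reward_fn, vocab, length, threshold,
                    pop_size=30, gens=40, seed=0):
    """Toy GA for a large, sparse-reward trajectory space: evolve token
    sequences and collect every trajectory whose reward clears `threshold`."""
    rng = random.Random(seed)
    pop = [[rng.choice(vocab) for _ in range(length)] for _ in range(pop_size)]
    successes = []
    for _ in range(gens):
        scored = sorted(pop, key=reward_fn, reverse=True)
        for t in scored:                          # record new successes
            if reward_fn(t) >= threshold and t not in successes:
                successes.append(t)
        elite = scored[: max(2, pop_size // 5)]   # elitist selection
        pop = [list(t) for t in elite]
        while len(pop) < pop_size:                # crossover + point mutation
            a, b = rng.sample(elite, 2)
            cut = rng.randrange(1, length)
            child = a[:cut] + b[cut:]
            if rng.random() < 0.3:
                child[rng.randrange(length)] = rng.choice(vocab)
            pop.append(child)
    return successes
```

A dense surrogate reward (here, positional matches against a target) is used for the demo; the paper's rewards are far sparser, which is exactly why the GA's exploratory role matters.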
- Europe > Switzerland > Zürich > Zürich (0.14)
- North America > United States > Maryland > Baltimore (0.04)
- North America > Canada (0.04)
- (2 more...)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.88)
- Information Technology > Artificial Intelligence > Machine Learning > Evolutionary Systems (0.88)
Explore Data Left Behind in Reinforcement Learning for Reasoning Language Models
Chenxi Liu, Junjie Liang, Yuqi Jia, Bochuan Cao, Yang Bai, Heng Huang, Xun Chen
Reinforcement Learning with Verifiable Rewards (RLVR) has emerged as an effective approach for improving the reasoning abilities of large language models (LLMs). The Group Relative Policy Optimization (GRPO) family has demonstrated strong performance in training LLMs with RLVR. However, as models train longer and scale larger, more training prompts become residual prompts: prompts with zero-variance rewards that provide no training signal. Consequently, fewer prompts contribute to training, reducing diversity and hindering effectiveness. To fully exploit these residual prompts, we propose the Explore Residual Prompts in Policy Optimization (ERPO) framework, which encourages exploration on residual prompts and reactivates their training signals. ERPO maintains a history tracker for each prompt and adaptively increases the sampling temperature for residual prompts that previously produced all correct responses. This encourages the model to generate more diverse reasoning traces, introducing incorrect responses that revive training signals. Empirical results on the Qwen2.5 series demonstrate that ERPO consistently surpasses strong baselines across multiple mathematical reasoning benchmarks.
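The history tracker and adaptive temperature can be sketched as below. This is an assumed interface, not the authors' implementation: the class name, the linear temperature schedule, and the reward-of-1.0-means-correct convention are all illustrative choices.

```python
class ResidualPromptTracker:
    """Per-prompt history tracking in the spirit of ERPO: prompts whose
    sampled response groups keep coming back all-correct (zero-variance
    reward, so GRPO-style advantages vanish) get a higher sampling
    temperature to encourage more diverse reasoning traces."""

    def __init__(self, base_temp=1.0, step=0.1, max_temp=1.5):
        self.base_temp = base_temp
        self.step = step
        self.max_temp = max_temp
        self.streak = {}  # prompt -> consecutive all-correct rounds

    def temperature(self, prompt):
        # Temperature grows linearly with the all-correct streak, capped.
        return min(self.base_temp + self.streak.get(prompt, 0) * self.step,
                   self.max_temp)

    def update(self, prompt, rewards):
        # Residual here: every response correct (assumed reward 1.0).
        if len(set(rewards)) == 1 and rewards[0] == 1.0:
            self.streak[prompt] = self.streak.get(prompt, 0) + 1
        else:
            self.streak[prompt] = 0
```

Once a residual prompt yields a mixed-reward group again, its streak resets and sampling returns to the base temperature.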
- North America > United States > Pennsylvania (0.04)
- North America > United States > Maryland > Prince George's County > College Park (0.04)
Learning Beyond Experience: Generalizing to Unseen State Space with Reservoir Computing
Declan A. Norton, Yuanzhao Zhang, Michelle Girvan
Machine learning techniques offer an effective approach to modeling dynamical systems solely from observed data. However, without explicit structural priors -- built-in assumptions about the underlying dynamics -- these techniques typically struggle to generalize to aspects of the dynamics that are poorly represented in the training data. Here, we demonstrate that reservoir computing -- a simple, efficient, and versatile machine learning framework often used for data-driven modeling of dynamical systems -- can generalize to unexplored regions of state space without explicit structural priors. First, we describe a multiple-trajectory training scheme for reservoir computers (RCs) that supports training across a collection of disjoint time series, enabling effective use of available training data. Then, applying this training scheme to multistable dynamical systems, we show that RCs trained on trajectories from a single basin of attraction can achieve out-of-domain generalization by capturing system behavior in entirely unobserved basins.
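The multiple-trajectory scheme can be sketched with a standard echo-state-style reservoir (numpy only). The specifics here (reservoir size, spectral radius, ridge readout, `train_rc_multi` and its parameters) are generic choices for illustration, not the paper's exact setup; the key point is that the reservoir state is reset between disjoint trajectories while a single shared readout is fit on the pooled data.

```python
import numpy as np

def train_rc_multi(trajectories, n_res=100, rho=0.9, ridge=1e-6, seed=0):
    """Drive the reservoir through each disjoint time series separately,
    resetting its state between trajectories, then fit one shared linear
    readout on the pooled (state, next-observation) pairs."""
    rng = np.random.default_rng(seed)
    dim = trajectories[0].shape[1]
    W_in = rng.uniform(-0.5, 0.5, (n_res, dim))
    W = rng.normal(0.0, 1.0, (n_res, n_res))
    W *= rho / np.max(np.abs(np.linalg.eigvals(W)))  # set spectral radius
    states, targets = [], []
    for traj in trajectories:
        r = np.zeros(n_res)                # fresh state for each trajectory
        for t in range(len(traj) - 1):
            r = np.tanh(W @ r + W_in @ traj[t])
            states.append(r)
            targets.append(traj[t + 1])
    S, Y = np.array(states), np.array(targets)
    # Ridge-regression readout shared across all trajectories.
    W_out = np.linalg.solve(S.T @ S + ridge * np.eye(n_res), S.T @ Y)
    return W_in, W, W_out

def one_step_preds(W_in, W, W_out, traj):
    """One-step-ahead predictions along a given trajectory."""
    r = np.zeros(W.shape[0])
    preds = []
    for t in range(len(traj) - 1):
        r = np.tanh(W @ r + W_in @ traj[t])
        preds.append(r @ W_out)
    return np.array(preds)
```

Per-trajectory state resets are what make disjoint segments usable: the readout never sees states contaminated by a spurious transition between unrelated series.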
- North America > United States > Maryland > Prince George's County > College Park (0.14)
- North America > United States > New Mexico > Santa Fe County > Santa Fe (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Europe > Netherlands > South Holland > Dordrecht (0.04)
- Energy (0.93)
- Government > Regional Government > North America Government > United States Government (0.46)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.67)
Masked Generative Adversarial Networks are Data-Efficient Generation Learners: Supplemental Materials
With limited training data, the discriminator tends to discriminate via meaningless shortcuts, focusing on easy-to-discriminate image locations and spectra instead of a holistic understanding of images; a randomly initialized model, for instance, produces very even spatial attentions. MaskedGAN can be modeled as an instance of the Two Time-Scale Update Rule: the losses in Eq. 2 are first re-written, Eqs. 7 and 18 are simplified using the sum rule of integration, the gradients of the generator loss functions are derived from Eqs. 20 and 23, and comparing Eqs. 25 and 28 then establishes the convergence of the proposed MaskedGAN. [The intermediate equations (Eqs. 2-28) are not reproduced in this extract.] In MaskedGAN, the mask ratio in generator training controls the learning pace of the generator by masking out a portion of the training signals back-propagated from the discriminator.
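The mask-ratio mechanism can be shown in isolation with a numpy toy (not the paper's implementation; `mask_training_signal` is a hypothetical helper): zeroing a random fraction of the per-location signal flowing back to the generator slows its learning pace as the ratio grows.

```python
import numpy as np

def mask_training_signal(grad_map, mask_ratio, seed=0):
    """Toy illustration of the mask-ratio idea: randomly zero out a
    `mask_ratio` fraction of the spatial training signal back-propagated
    from the discriminator to the generator."""
    rng = np.random.default_rng(seed)
    keep = rng.random(grad_map.shape) >= mask_ratio  # Bernoulli keep-mask
    return grad_map * keep
```

A ratio of 0 leaves the signal untouched; higher ratios suppress a larger share of locations per update.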
- Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.14)
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
- Europe > Germany (0.04)
- Asia > Singapore (0.04)
Improving Sampling Efficiency in RLVR through Adaptive Rollout and Response Reuse
Yuheng Zhang, Wenlin Yao, Changlong Yu, Yao Liu, Qingyu Yin, Bing Yin, Hyokun Yun, Lihong Li
Large language models (LLMs) have achieved impressive reasoning performance, with reinforcement learning with verifiable rewards (RLVR) emerging as a standard paradigm for post-training. A representative algorithm, group relative policy optimization (GRPO) (Shao et al., 2024), computes advantages by normalizing outcome rewards within response groups, but suffers from a vanishing advantage issue when all responses in a group receive identical rewards. To address this issue, we propose Adaptive Rollout and Response Reuse Policy Optimization (AR3PO), a sampling-efficient RLVR algorithm that introduces two novel techniques: adaptive rollout, which dynamically allocates more responses to difficult prompts while saving computation on easier ones, and response reuse, which leverages previously generated correct responses to provide useful training signals. We compare AR3PO with strong RLVR baselines on multiple representative benchmarks using two different families of base models. Across the 7B and 8B models, AR3PO consistently outperforms GRPO and matches or surpasses DAPO (Yu et al., 2025), reducing rollout cost by up to 4.2x. On the larger 32B model, AR3PO achieves comparable performance to DAPO at similar training steps while maintaining substantially lower rollout cost.
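The two techniques can be sketched together. These are assumed interfaces for illustration, not AR3PO's actual rules: `allocate_rollouts` uses a simple difficulty-proportional split with clipping, and `ResponseBuffer` caches correct responses for reuse.

```python
class ResponseBuffer:
    """Sketch of response reuse: cache previously generated correct
    responses per prompt so they can supply training signal later."""

    def __init__(self):
        self.cache = {}

    def add(self, prompt, response, reward):
        if reward > 0:  # keep only correct responses
            self.cache.setdefault(prompt, []).append(response)

    def reuse(self, prompt):
        return list(self.cache.get(prompt, []))


def allocate_rollouts(difficulty, budget, n_min=2, n_max=16):
    """Sketch of adaptive rollout: split a fixed rollout budget across
    prompts in proportion to estimated difficulty (e.g. historical
    failure rate), clipped to [n_min, n_max] responses per prompt."""
    total = sum(difficulty.values()) or 1.0
    return {p: int(max(n_min, min(n_max, round(budget * d / total))))
            for p, d in difficulty.items()}
```

Hard prompts thus receive more rollouts per step, while easy prompts fall back to the floor `n_min` and, when they stall, can draw on cached correct responses instead of fresh sampling.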
- North America > United States (0.14)
- Asia > Middle East > Israel (0.05)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)
- Media > Music (1.00)
- Leisure & Entertainment (1.00)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Speech (0.95)
- Information Technology > Artificial Intelligence > Vision (0.94)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
- Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.14)
- Asia > Middle East > Republic of Türkiye > Karaman Province > Karaman (0.04)